Abstract: ILLIAC II is a general purpose computer built at the 
University of Illinois, Urbana. It contains about 55 000 transistors 
and has a floating multiply time of 6.3 μs. A number of features are 
provided to increase the speed of operation. There are three controls 
in largely concurrent operation. The control circuits are largely 
asynchronous and speed independent. The floating point arithmetic 
unit contains two adders and utilizes redundant number representa- 
tion and separate carry storage. The memory hierarchy has members 
ranging from the 10-word, 0.2-μs flow gating memory to magnetic 
tapes. Order fetches from the main 1.8-μs core memory are 
minimized by packing two to four orders per word, and by holding 
two words of orders in the flow gating memory for execution of short 
loops. The bibliography lists 40 papers related to the design of this 
computer. 

ILLIAC II is a large, high-speed, general-purpose 
computer built by the Digital Computer Laboratory, 
University of Illinois, Urbana. Comprehensive 
plans for its construction were given in a widely 
quoted 1957 report [38]. No similarly comprehensive 
post-construction report exists, although a number of 
papers describing various aspects of the computer have 
been published. This bibliography lists these papers and 
provides a short description of the computer as a guide 
to the entries. 

The papers cited fall into two classes. The first class 
is the open literature consisting of journal articles, symposia 
proceedings, and the like. The second class is 
Digital Computer Laboratory Reports. These are in- 
cluded because they are rather widely held in the librar- 
ies of computer organizations and they have been cited 
in the literature. The internal Digital Computer Labora- 
tory documents related to construction are not cited 
here. 

History 

Planning for ILLIAC II began June 1, 1956, and 
culminated in 1957 in a report describing the proposed 
design [38]. Design began in 1957 and final chassis con- 
struction began in 1960. In 1962, the two controls, arith- 
metic unit and core memory began operation with paper 
tape input and output. At the present time the machine 
is essentially complete and in use, and work continues 
on the addition of input-output devices and other pe- 
ripheral equipment. The work has been supported jointly 
by the University of Illinois, the Atomic Energy Com- 
mission and the Office of Naval Research. The IBM 
Corporation donated a number of input-output devices. 

Organization 

ILLIAC II is a highly parallel computer, with three 
simultaneously operating controls. Operations of the 
floating-point arithmetic unit are controlled by an 
arithmetic control. Transfer of data between the core 
memory and the slower memories is controlled by an 
interplay control. Other control functions are performed 
by a supervisory control called Advanced Control. 
Among the functions of Advanced Control are fetching 
and storing of operands, address construction and index- 
ing, and partial decoding of the orders for the other two 
controls. The Advanced Control order code is rather 
elaborate, and in conjunction with the 13-bit registers 
in the fast memory it provides for a large variety of 13- 
bit fixed-point arithmetic and logical operations, except 
multiplication and division. 

The hierarchy of memories consists of the following: 

1) Fast transistor memory, 10 words, 0.2 μs. 

2) Core memory, 8192 words (soon to be 12 288 words), 1.8 μs. 

3) Drum memory, 65 536 words, 8.5-ms average access time, 7.8-μs word period. 

4) Magnetic tapes and disk files. 

The order code contains long and short instructions. 
A 13-bit short instruction, which occupies only a quarter 
word, contains four bits to specify an index register con- 
taining an address or a fast register containing an oper- 
and. A 26-bit long instruction contains in addition a 
13-bit address. Long instructions may be packed two to 
a word. Two words of orders are held in the fast mem- 
ory. This makes it possible to execute a loop of up to 
eight short instructions (two words) without any in- 
struction fetches from the core memory. If at the same 
time the operands are held in the ten-word fast memory, 
a very fast loop can be written. 
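
The packing arithmetic can be illustrated with a short sketch. It assumes, purely for illustration, that an order may start on any quarter-word boundary but may not straddle a word boundary; the actual placement rules are not stated here, and the name `pack_orders` is hypothetical.

```python
SHORT, LONG = 1, 2          # order lengths in 13-bit quarter words
QUARTERS_PER_WORD = 4       # 4 x 13 bits = one 52-bit word

def pack_orders(lengths):
    """Greedily pack orders (lengths in quarters) into 52-bit words.

    ASSUMPTION: an order never straddles a word boundary; quarters left
    over at the end of a word are treated as padding.
    """
    words, current, used = [], [], 0
    for n in lengths:
        if n not in (SHORT, LONG):
            raise ValueError("order length must be 1 or 2 quarters")
        if used + n > QUARTERS_PER_WORD:
            words.append(current)        # close the full (or padded) word
            current, used = [], 0
        current.append(n)
        used += n
    if current:
        words.append(current)
    return words
```

Under these assumptions a loop of eight short orders occupies exactly two words, which is the amount of order storage held in the fast memory.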

A detailed consideration of the size and speed require- 
ments of the various parts of the machine for several 
classes of problems is given in Taub, et al. [38], which 
also contains an early version of the order code. Con- 
sideration of problem types is also contained in Taub 
[39]. More detailed descriptions of the organization and 
the order code are contained in Gillies [8], [9], [11]. 
Up-to-date details are given in the ILLIAC II programmer's 
manual [5]. 

Arithmetic Unit 

The arithmetic unit is asynchronous, double-preci- 
sion, floating-point. It is radix 4 in almost all respects. 
Single-precision operands are 52 bits long, with a 45-bit 
fraction and a 7-bit exponent (base 4) in radix complement 
representation. The range of normalized single-precision 
numbers in the memory is 

$4^{-64} < |x| < 4^{63}$. 

Results of most arithmetic operations are not normal- 
ized and the programmer is free to normalize or not as 
he stores them. To aid in fixed-point programming, 
orders are provided which force the exponent to one of 
three values, with corresponding shifts in the fraction 
part. The roundoff which occurs when storing a double-precision 
arithmetic result in the single-precision memory 
is obtained by adding 1 or 0 to the last retained 
fraction bit for discarded fractions greater or less than 
one half, respectively. The equality case is made dependent 
on the (presumably random) last retained bit 
to produce an unbiased roundoff. 
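
A minimal sketch of this roundoff rule follows. It works on a non-negative integer fraction and ignores sign handling and fraction overflow. The tie case is an assumption: the text says only that it depends on the last retained bit, and the sketch takes the increment to be equal to that bit (round half to even). The name `round_fraction` is hypothetical.

```python
def round_fraction(value, discard_bits):
    """Drop `discard_bits` low-order bits of a non-negative fraction.

    Adds 1 to the last retained bit if the discarded part exceeds one
    half, and 0 if it is less than one half. On an exact tie this sketch
    ASSUMES the increment equals the last retained bit.
    """
    if discard_bits <= 0:
        return value
    retained = value >> discard_bits
    discarded = value & ((1 << discard_bits) - 1)
    half = 1 << (discard_bits - 1)
    if discarded > half:
        return retained + 1
    if discarded < half:
        return retained
    return retained + (retained & 1)     # tie: decided by last retained bit
```

With this choice, ties are rounded up about half the time when the last retained bit is random, so the rounding error has no systematic sign.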

A number of features are provided to increase the 
speed of operation. Redundant number representations 
and separate carry storage are used within part of the 
arithmetic unit to eliminate carry propagation during 
repeated additions such as occur in multiplication. In 
general a carry bit is provided for each two fraction bits. 
Multiplier digits, originally having values 0, 1, 2, 3, are 
recoded to the range −1, 0, 1, 2 and two-at-a-time shifts 
are provided. Two adders are provided so that addition 
may be performed both while gating from the accumu- 
lator (A, Q) to the temporary accumulator (S, R) and 
vice versa. Radix 4 division was considered by Robertson 
[30], but rejected in favor of redundant binary nonrestoring 
division, wherein the quotient digits are generated 
as −1, 0, +1 and then recoded as base 4 digits 
with values between −3 and +3. Carries are assimilated 
before a store, since the other parts of the computer do 
not use redundant number representation. 
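
The two recodings mentioned above can be sketched as follows. This is a conventional digit-plus-carry recoding and a simple pairing of quotient digits, given under the assumption that they match the intent of the text; it is not a description of the circuits actually built, and all function names are hypothetical.

```python
def recode_multiplier(digits):
    """Recode radix-4 digits (least significant first) from {0,1,2,3}
    into {-1,0,1,2}. A digit-plus-carry of 3 or 4 becomes -1 or 0 with a
    carry of 1 into the next position. The result may be one digit longer.
    """
    out, carry = [], 0
    for d in digits:
        if d not in (0, 1, 2, 3):
            raise ValueError("radix-4 digit expected")
        t = d + carry
        if t >= 3:
            out.append(t - 4)
            carry = 1
        else:
            out.append(t)
            carry = 0
    if carry:
        out.append(1)
    return out

def radix4_value(digits):
    """Value of a radix-4 digit list, least significant digit first."""
    return sum(d * 4 ** i for i, d in enumerate(digits))

def pair_quotient_digits(bits):
    """Combine redundant binary digits {-1,0,1} (most significant first)
    two at a time into base-4 digits in the range -3..3."""
    if len(bits) % 2:
        raise ValueError("an even number of digits is expected")
    return [2 * hi + lo for hi, lo in zip(bits[0::2], bits[1::2])]
```

The recoding preserves the multiplier value while limiting the multiples of the multiplicand needed at each step to −1, 0, 1 and 2 times, which can be formed by complementing and shifting.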

The floating-point arithmetic unit as constructed is 
described theoretically in Robertson [31] and in detail 
in Penhollow [19]. Earlier plans were described in Taub, 
et al. [38] and Wheeler [40]. In addition there were a 
number of earlier studies. These included redundant 
number representations by Avizienis [1], [2], [3] and 
Metze [14], use of redundant number representation in 
the whole computer instead of just the arithmetic unit 
by Metze and Robertson [15], separate carry storage 
adders by Takahashi [37], efficient multiplier and divi- 
sion recodings by Penhollow [18], and efficient division 
by Robertson [30], Metze [16] and Shively [34]. 

Speed Independence and Control Design 

Theories of asynchronous circuits and speed inde- 
pendence were studied extensively prior to construction. 
The speed independence problem is stated physically 
and theoretically in Taub, et al. [38]. Detailed theoreti- 
cal studies are in Muller and Bartky [17], Shelly [33], 
and Bartky [4]. A circuit is speed-independent if its 
function does not depend on the speeds at which its 
constituent parts operate. Advantages of speed inde- 
pendence are increased reliability and ease of main- 
tenance. 

The realization of speed independence used in the 
controls of ILLIAC II involves the collection of reply 
signals to insure that all the operations which must be 
performed at each step are complete before going on to 
the next step. Some of the problems involved in design- 
ing the arithmetic control in this way are described in 
Swartwout [35], Robertson [32], and Gillies [10]. Advanced 
Control was designed in a similar but not identical 
way. The arithmetic unit was not made speed-independent, 
in order to avoid increasing its complexity 
and cost and decreasing its speed. The electromechanical 
peripheral devices are inherently synchronous, but 
the philosophy of speed independence was partly ex- 
tended to them by the provision of replies and alarms 
for many of the control signals. 
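
The idea of collecting reply signals can be illustrated by a small event simulation. This is only a behavioral sketch: the operation names and the function `run_steps` are hypothetical, and the simulation says nothing about the circuits used in the controls.

```python
import heapq

def run_steps(steps, delay):
    """Simulate a control that starts every operation of a step and then
    waits for a reply from each one before starting the next step.

    steps: list of lists of distinct operation names.
    delay: function(name) -> time the operation takes (arbitrary).
    Returns (indices of steps in completion order, total elapsed time).
    """
    now = 0.0
    completed = []
    for index, ops in enumerate(steps):
        pending = set(ops)
        replies = [(now + delay(op), op) for op in ops]
        heapq.heapify(replies)
        while pending:                      # collect all reply signals
            now, op = heapq.heappop(replies)
            pending.discard(op)
        completed.append(index)
    return completed, now
```

Changing the delays changes only the elapsed time. The sequence of steps is the same for any choice of delays, which is the property described above as speed independence.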

A theoretical study of methods of designing a speed- 
independent control, including the method actually 
used for the arithmetic control, is contained in Swart- 
wout [36]. 

Speeds 

Some approximate operation times are as follows. 

Floating add or subtract     2.5 to 3.5 μs 
Floating multiply            6.3 μs 
Floating divide              16.0 μs 
Indexing                     1.0 μs 
13-bit integer orders        2.0 μs 
Fast memory                  0.2 μs 
Core memory                  1.8 μs 

The times shown for arithmetic do not include instruction 
or operand accessing times because Advanced 
Control performs memory accesses concurrently with 
arithmetic, usually at zero net time cost. Instruction 
decoding, address construction and indexing are 
similarly overlapped with arithmetic, and most of these 
absorb no effective time at all. 

Fast Memory 

Ten words of very fast storage are provided, called 
the fast memory or flow-gating memory. These ten 
registers are composed of transistor flip-flops with com- 
mon input and output buses and special gating arrange- 
ments to keep the number of transistors small. The de- 
sign achieves high speed and high sensitivity along with 
the usually contradictory high stability by using varia- 
ble feedback. During the write-in operation, a gate sig- 
nal lowers the average potential of the flip-flop. This 
produces the following two effects: 1) information is 
allowed to flow into the circuit through a diode from the 
input bus and 2) the feedback in the flip-flop is disabled. 
This reduces the circuit to a difference amplifier, and the 
information is stored in the base-emitter capacitances. 
At the end of the write-in operation the average poten- 
tials are raised back to normal, thus cutting off the input 
diode and allowing the feedback to permanently store 
the information. The operation time is 0.2 μs. The 
transistor counts per bit are: basic flip-flop, 2; output 
driver, 1; write and read drivers and terminations, about 
2.3. 

The fast memory sits at the "crossroads" of the com- 
puter, and some of its registers are also intimately 
identified with other parts of the machine, e.g., the core 
memory, Advanced Control, the arithmetic control, 
and the arithmetic unit. Four of the fast registers are 
also addressable as quarter words, thus providing 16 
registers of 13 bits each for use as index registers and for 
other purposes. 
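
Quarter-word addressing of a 52-bit register can be sketched as below. The numbering is an assumption made for illustration: quarter 0 is taken to be the most significant 13 bits, and 13-bit register n is taken to live in word n // 4, quarter n % 4. Neither convention is stated in the text, and the function names are hypothetical.

```python
QUARTER_BITS = 13
QUARTER_MASK = (1 << QUARTER_BITS) - 1

def read_quarter(word, q):
    """Return quarter q (0..3) of a 52-bit word.
    ASSUMPTION: quarter 0 is the most significant."""
    if not 0 <= q <= 3:
        raise ValueError("quarter must be 0..3")
    shift = (3 - q) * QUARTER_BITS
    return (word >> shift) & QUARTER_MASK

def write_quarter(word, q, value):
    """Return `word` with quarter q replaced by the low 13 bits of `value`."""
    if not 0 <= q <= 3:
        raise ValueError("quarter must be 0..3")
    shift = (3 - q) * QUARTER_BITS
    return (word & ~(QUARTER_MASK << shift)) | ((value & QUARTER_MASK) << shift)

def index_register(fast_words, n):
    """Read 13-bit register n (0..15) from four 52-bit fast registers.
    ASSUMPTION: register n is word n // 4, quarter n % 4."""
    return read_quarter(fast_words[n // 4], n % 4)
```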



The early plans for the fast memory were given in 
Taub, et al. [38], and Poppelbaum [20]-[22]. A brief 
mention is also made in Poppelbaum [24]. Detailed 
experimental data on the fast memory, including toler- 
ance analyses, waveforms and other details, is given in 
Guckel, Kunihiro and Crow [12]. A patent covering the 
flow-gating principle was issued in 1962 [25]. 

Core Memory 

The core memory was originally planned to contain 
8192 words of 52 bits plus parity each. There were to be 
two 4096-word modules, with odd addresses in one 
module and even addresses in the other to halve the 
average access time for sequential addresses. The first 
4096-word module was completed in 1962. It was word 
oriented, with one switch core per word and two data 
cores per bit. Two data cores per bit gave bipolar output 
and a loading on the switch cores that was virtually 
independent of the digit pattern. Partial switching was 
used to increase speed and reduce core heating. Readout 
was destructive and a restoration cycle was provided. 
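
The odd/even interleaving of two 4096-word modules can be written as a one-line address mapping. The sketch assumes that module 0 holds the even addresses and that the remaining 12 bits select the word within the module; the module numbering and the name `interleaved_location` are hypothetical.

```python
def interleaved_location(address):
    """Map a 13-bit address to (module, offset) for two 4096-word modules.
    ASSUMPTION: even addresses are in module 0, odd addresses in module 1."""
    if not 0 <= address < 8192:
        raise ValueError("13-bit address expected")
    return address & 1, address >> 1
```

Successive addresses then fall in alternate modules, so an access to one module can overlap the restoration cycle of the other.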

Early plans for the core memory were described in 
Taub, et al. [38]. Some earlier experiments were re- 
ported in McKay, et al. [13]. Detailed plans for the 
construction of the first 4096-word module were de- 
scribed in Ray [27]. Theoretical studies of partial 
switching are contained in Ray [28], [29]. 

The first 4096-word module was finished in 1962, and 
has been in operation since then (without the interleaved 
addresses feature) at a cycle time of 1.8 μs. In 1964, a 
commercial 8192-word core memory was purchased. The 
original 4096-word module and 4096 words of the com- 
mercial core memory are now in operation with inter- 
leaved addresses. This exhausts the addressing capabil- 
ities of the original 13-bit address field. The addressing 
scheme is presently being modified to allow the addi- 
tional 4096 words also to be used. 

Circuits 

The basic circuits used in the high-speed portions of 
the machine are nonsaturating current switching cir- 
cuits using pnp germanium mesa transistors. Switching 
times are 10 to 40 ns. Early reports on these circuits are 
Taub, et al. [38] and Poppelbaum and Wiseman [22]. 
The actual construction was based on a revised design 
completed in the summer of 1960. A patent covering the 
asymmetrical flip-flop was issued in 1960 [23]. A tuto- 
rial description of some of the memory elements is in Rao 
[26]. 

The slower parts of the computer (Interplay, Drum 
Memory, Input-output Channels, etc.) contain a variety 
of slower circuits. These include saturating, nonsaturating, 
current switching, and NOR topologies using 
germanium transistors. 

The computer contains about 55 000 transistors and 
133 000 diodes, exclusive of the commercially built 
input-output devices. 

Input-Output and Interrupt 

Two input-output systems are provided, a high capac- 
ity full word system and a slower quarter word system. 

Full word data transfers in the memory hierarchy are 
between the core memory and one of the other memories 
or devices. Transfers between the core memory and the 
ten-word fast memory are supervised by Advanced Con- 
trol. All other full word transfers are performed by 
Interplay, which contains the necessary controls and 
data buffers. Interplay is a wired program computer of 
a limited sort. It begins a data transfer between the core 
memory and one of the other memories or devices in 
response to a command from Advanced Control. After 
the initial setup, Advanced Control and Interplay oper- 
ate independently without interaction except that they 
compete for core memory accesses. Each of the Interplay 
Channels can be performing a transfer at the same time. 
Currently there are nine channels in use out of a possible 
32. The capacity of Interplay is one word every 3.5 μs. 

The slower input-output system, called the special 
register system, allows Advanced Control to exchange 
13-bit characters with up to 64 input-output registers. 
Each 13-bit transfer requires Advanced Control to execute 
one order. Interplay, by contrast, operates in parallel 
with Advanced Control and requires the execution of only 
two Advanced Control orders to transfer a block of data, 
generally 256 words. The special 
register system is used for low-speed input-output 
and to transmit control and status information for 
peripheral devices. 

An interrupt system is connected to certain bits of the 
special registers. For example, when an Interplay chan- 
nel completes the transfer of a block of data, a comple- 
tion signal is provided via one of the special registers. 
This may, if desired, interrupt the program then running 
and call a supervisory program to initiate another trans- 
fer or take other action. The interrupt system may also 
be actuated by errors, power failures, requests from 
consoles, expiration of a time interval, etc. 

Magnetic Drum Memory 

The Magnetic Drum Memory stores 65 536 words on 
two 3400-r/min drums. Each word is stored as four 13-bit 
characters plus parity. The character period is 1.95 μs; 
the word period is 7.8 μs. Nonreturn-to-zero recording 
is used at a packing density of 288 bits per inch. Full 
52-bit parallel recording with a 1.95-μs word period was 
considered but not used because it would have required 
four times as many read and write amplifiers and it 
would have almost completely occupied the core mem- 
ory while a drum transfer was in progress. Drum data 
is written and read in 256-word blocks, with eight 
blocks per band, and 16 bands per drum. Gaps between 
the blocks allow for head switching so that following 
any block transfer, random access to one of the 16 
blocks in the next sector may be obtained without wait- 
ing. 
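
The drum figures quoted above can be cross-checked with a few lines of arithmetic. The constants are taken from the text; the assumption that the eight blocks of a band pass the heads once per revolution is an inference, and the function names are hypothetical.

```python
WORDS_PER_BLOCK = 256
BLOCKS_PER_BAND = 8
BANDS_PER_DRUM = 16
DRUMS = 2
CHARS_PER_WORD = 4
CHAR_PERIOD_US = 1.95
RPM = 3400

def drum_capacity_words():
    """Total capacity of both drums in words."""
    return WORDS_PER_BLOCK * BLOCKS_PER_BAND * BANDS_PER_DRUM * DRUMS

def word_period_us():
    """Time to transfer one word as four serial 13-bit characters."""
    return CHARS_PER_WORD * CHAR_PERIOD_US

def revolution_ms():
    """Time for one drum revolution."""
    return 60_000 / RPM

def data_time_per_revolution_ms():
    """Time spent over data in one band per revolution.
    ASSUMPTION: a band's eight blocks pass the heads once per revolution."""
    return WORDS_PER_BLOCK * BLOCKS_PER_BAND * word_period_us() / 1000
```

Under these assumptions the capacity comes to 65 536 words, and the data in one band occupy about 16.0 ms of a 17.6-ms revolution, leaving the remainder for the gaps between blocks. Half a revolution is about 8.8 ms, the same order as the quoted 8.5-ms average access time.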



System Programs 

The ILLIAC II software includes an assembler called 
NICAP, a FORTRAN II translator, and an operating 
system program. Among other things, NICAP handles 
the multiple-orders-per-word problem and translates 
complex address field expressions, including nested 
parentheses to any depth. Parts of address field expres- 
sions which can be evaluated at translation time are so 
evaluated. The remaining additions and subtractions 
are prepared for execution at run time by the 13-bit 
fixed-point arithmetic unit in Advanced Control; multiplications 
and divisions are prepared for execution by 
the floating-point arithmetic unit. The address field 
compilation algorithm is described in Gear [6]. 
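
The idea of evaluating part of an address field expression at translation time can be illustrated by a generic constant-folding sketch. It is not the algorithm of Gear [6]: it uses Python's `ast` module to parse the expression, supports only +, − and ×, and does not reassociate terms. The function name `fold` and the symbol names in the usage example are hypothetical.

```python
import ast
import operator

_OPS = {ast.Add: ("+", operator.add),
        ast.Sub: ("-", operator.sub),
        ast.Mult: ("*", operator.mul)}

def fold(expr, known):
    """Fold the parts of `expr` that are constant at translation time.

    known: dict of symbol values available at translation time.
    Returns an int if the whole expression is constant; otherwise a
    nested tuple (op, left, right) whose leaves are ints or the names of
    symbols that must be evaluated at run time.
    """
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.Name):
            return known.get(node.id, node.id)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            symbol, func = _OPS[type(node.op)]
            left, right = walk(node.left), walk(node.right)
            if isinstance(left, int) and isinstance(right, int):
                return func(left, right)      # evaluated at translation time
            return (symbol, left, right)      # left for run time
        raise ValueError("unsupported expression")

    return walk(ast.parse(expr, mode="eval").body)
```

For example, `fold("(BASE + 2) * 3 + i", {"BASE": 10})` leaves a single run-time addition of 36 and `i`.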

The FORTRAN II translator produces assembly 
language in a single pass. Effective use of the Drum 
Memory enables the translator to proceed without the 
use of magnetic tapes, thus gaining an order of magni- 
tude in speed. The operating system program provides 
for batch processing. The various system and library 
programs are described in a user's manual [5] and in a 
compiler writer's manual [7]. 